Recognition features for Old Slavic letters: Macedonian versus Bosnian alphabet
Abstract
This paper compares recognition methods for two Old Slavic Cyrillic alphabets: Macedonian and Bosnian. Two novel methodologies for recognizing Old Macedonian letters have already been implemented and experimentally tested by measuring their recognition accuracy and precision. The first is a decision tree classifier realized as a set of rules; the second is a fuzzy classifier. To improve the performance of the decision tree classifier, the extracted rules are corrected according to their accuracy and coverage. The fuzzy classifier consists of rules constructed by fuzzy aggregation of letter features. Both classifiers use the same set of discriminative features, such as the number and position of spots in the outer segments, the presence and position of horizontal and vertical lines and of holes, compactness, and symmetry. We argue that the same feature set can be used to recognize Old Bosnian letters. Moreover, because the graphemes are similar and the Bosnian alphabet has fewer letters, we expect the recognition system to be more efficient for Bosnian letters.
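The abstract does not say how rule accuracy and coverage are computed or how the rules are corrected. As a minimal sketch, assuming the common definitions (coverage is the fraction of samples a rule fires on, accuracy is the fraction of those samples whose label matches the rule's prediction), rule filtering might look like the following. The feature names, letter labels, thresholds, and toy samples are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, int]
Sample = Tuple[Features, str]  # (letter features, true letter label)


@dataclass
class Rule:
    name: str
    condition: Callable[[Features], bool]  # when the rule fires
    label: str                             # letter the rule predicts


def rule_stats(rule: Rule, samples: List[Sample]) -> Tuple[float, float]:
    """Return (coverage, accuracy) of one rule on a labelled sample set."""
    covered = [(f, y) for f, y in samples if rule.condition(f)]
    if not covered:
        return 0.0, 0.0
    coverage = len(covered) / len(samples)
    accuracy = sum(1 for _, y in covered if y == rule.label) / len(covered)
    return coverage, accuracy


def prune_rules(rules: List[Rule], samples: List[Sample],
                min_accuracy: float = 0.8,
                min_coverage: float = 0.1) -> List[Rule]:
    """Keep only rules that meet both (assumed) thresholds."""
    kept = []
    for rule in rules:
        coverage, accuracy = rule_stats(rule, samples)
        if accuracy >= min_accuracy and coverage >= min_coverage:
            kept.append(rule)
    return kept


# Hypothetical toy data: "holes" and "vertical_line" stand in for the
# hole and vertical-line features mentioned in the abstract.
samples: List[Sample] = [
    ({"holes": 1, "vertical_line": 1}, "R"),
    ({"holes": 1, "vertical_line": 0}, "O"),
    ({"holes": 0, "vertical_line": 1}, "G"),
    ({"holes": 1, "vertical_line": 0}, "O"),
]

r1 = Rule("r1", lambda f: f["holes"] == 1 and f["vertical_line"] == 0, "O")
r2 = Rule("r2", lambda f: f["holes"] == 1, "O")

kept = prune_rules([r1, r2], samples)
print([r.name for r in kept])
```

In this toy set, the more specific rule `r1` is kept, while `r2` fires on more samples but mislabels one of them and falls below the assumed accuracy threshold.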
Similar resources
Speech Technologies for Serbian and Kindred South Slavic Languages
This chapter will present the results of the research and development of speech technologies for Serbian and other kindred South Slavic languages used in five countries of the Western Balkans, carried out by the University of Novi Sad, Serbia in cooperation with the company AlfaNum. The first section will describe particularities of highly inflected languages (such as Serbian and other language...
Toward Pan-Slavic NLP: Some Experiments with Language Adaptation
There is great variation in the amount of NLP resources available for Slavic languages. For example, the Universal Dependency treebank (Nivre et al., 2016) has about 2 MW of training resources for Czech, more than 1 MW for Russian, while only 950 words for Ukrainian and nothing for Belorussian, Bosnian or Macedonian. Similarly, the Autodesk Machine Translation dataset only covers three Slavic l...
Speech recognition for East Slavic languages: the case of Russian
In this paper, we present a survey of state-of-the-art systems for automatic processing and recognition of under-resourced languages of Eastern Europe, in particular the East Slavic languages (Ukrainian, Belarusian and Russian), which share some prominent features, including the Cyrillic alphabet, phonetic classes, the morphological structure of word forms and relatively free grammar. A large voca...
Are Pitch Contour and Quantity Independent Distinctive Features in Bosnian Serbian?
This study investigates the independent phonological status of tonal and quantity contrasts in Bosnian Serbian, a South Slavic pitch accent language. Accents in Bosnian Serbian are characterised by falling vs. rising contours and long vs. short quantity, leading to four different accent types. Previous research suggests that pure tonal contrasts are hard to distinguish on acoustic and p...
Serbo-Croatian Hyphenation: a TeX Point of View
Serbo-Croatian is one of the South-Slavic languages. It is characterized, as other Slavic languages, by a rich morphology. A particular feature of the language is its almost fully phonological orthography, i.e. on a word level, one letter corresponds to each phoneme and vice versa. As a result, the written text practically represents a phonemic transcription of speech. Still, the Serbo-Croatian...